Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT: Ensure last-use copy omission candidates are marked with GTF_GLOB_REF #89088

Merged
merged 5 commits into from
Jul 19, 2023

Conversation

jakobbotsch
Copy link
Member

@jakobbotsch jakobbotsch commented Jul 18, 2023

Last-use copy omission allows the JIT to avoid creating copies of struct values that are passed implicitly by-ref when the value is a last-use of a local. When we do this we effectively address expose the local for the lifetime of the call, so we mark the local as address exposed as part of doing so.

However, there is a problem where morph may reorder two uses of a such a local and break the last use information. For example, consider the following case:

[000015] --CXG------CALL      void   Program:Foo(int,int)
[000010] ----------- arg0 ├──▌  LCL_FLD   int    V00 loc0         [+0]
[000012] --CXG------ arg1 └──▌  CALL      int    Program:Bar(S):int
[000011] ----------- arg0    └──▌  LCL_VAR   struct<S, 32> V00 loc0          (last use)

V00 is used both at [000010] and at [000011], the latter as a last use. Since we only address expose V00 when we get to [000011] we do not mark [000010] with GTF_GLOB_REF; the net effect is the following call args morphing, where we have reordered the two occurrences illegally:

[000015] --CXG+-----CALL      void   Program:Foo(int,int)
[000037] DACXG------ arg1 setup  ├──▌  STORE_LCL_VAR int    V04 tmp3
[000012] --CXG+-----             │  └──▌  CALL      int    Program:Bar(S):int
[000011] -----+----- arg0 in rcx │     └──▌  LCL_ADDR  long   V00 loc0         [+0]
[000038] ----------- arg1 in rdx ├──▌  LCL_VAR   int    V04 tmp3
[000010] -----+----- arg0 in rcx └──▌  LCL_FLD   int   (AX) V00 loc0         [+0]

This change fixes the problem by running a separate pass over the IR before morph to identify the candidates for last-use copy omission. These candidates then have GTF_GLOB_REF added as part of morph. The net result is then the following correct IR:

[000015] --CXG+-----CALL      void   Runtime_85611:Foo(int,int)
[000039] DA--G------ arg0 setup  ├──▌  STORE_LCL_VAR int    V05 tmp4
[000010] ----G+-----             │  └──▌  LCL_FLD   int   (AX) V00 loc0         [+0]
[000037] DACXG------ arg1 setup  ├──▌  STORE_LCL_VAR int    V04 tmp3
[000012] --CXG+-----             │  └──▌  CALL      int    Runtime_85611:Bar(Runtime_85611+S):int
[000011] ----G+----- arg0 in rcx │     └──▌  LCL_ADDR  long   V00 loc0         [+0]
[000038] ----------- arg1 in rdx ├──▌  LCL_VAR   int    V04 tmp3
[000040] ----------- arg0 in rcx └──▌  LCL_VAR   int    V05 tmp4

I first started out with a fix that simply address exposed the candidates in the pass, but this has some semi-large regressions (diffs) since it disables lots of optimizations that morph will do.
It's still not ideal since address exposing the local late just has the effect of making these optimizations depend on the block visit order; for .NET 9 I want to consider a fix where we instead do this optimization in the backend.

Fix #85611

Tiny number of diffs are expected with positive TP diffs in minopts due to the removal of fgRemarkGlobalUses. The TP impact of the pass is minor due to the use of GTF_CALL and the locals linked list to exit very early (about 0.03%, less than what is gained by the removal of fgRemarkGlobalUses).

Last-use copy omission allows the JIT to avoid creating copies of struct
values that are passed implicitly by-ref when the value is a last-use of
a local. When we do this we effectively address expose the
local for the lifetime of the call, so we mark the local as address
exposed as part of doing so.

However, there is a problem where morph may reorder two uses of a such a
local and break the last use information. For example, consider the
following case:

```
[000015] --CXG------      ▌  CALL      void   Program:Foo(int,int)
[000010] ----------- arg0 ├──▌  LCL_FLD   int    V00 loc0         [+0]
[000012] --CXG------ arg1 └──▌  CALL      int    Program:Bar(S):int
[000011] ----------- arg0    └──▌  LCL_VAR   struct<S, 32> V00 loc0          (last use)
```

V00 is used both at [000010] and at [000011], the latter as a last use.
Since we only address expose V00 when we get to [000011] we do not mark
[000010] with GTF_GLOB_REF; the net effect is the following call args
morphing, where we have reordered the two occurrences illegally:

```
[000015] --CXG+-----             ▌  CALL      void   Program:Foo(int,int)
[000037] DACXG------ arg1 setup  ├──▌  STORE_LCL_VAR int    V04 tmp3
[000012] --CXG+-----             │  └──▌  CALL      int    Program:Bar(S):int
[000011] -----+----- arg0 in rcx │     └──▌  LCL_ADDR  long   V00 loc0         [+0]
[000038] ----------- arg1 in rdx ├──▌  LCL_VAR   int    V04 tmp3
[000010] -----+----- arg0 in rcx └──▌  LCL_FLD   int   (AX) V00 loc0         [+0]
```

This change fixes the problem by running a separate pass over the IR
before morph to identify and address expose the candidates for last-use
copy omission. The net result is then the following correct IR:

```
[000015] --CXG+-----             ▌  CALL      void   Runtime_85611:Foo(int,int)
[000039] DA--G------ arg0 setup  ├──▌  STORE_LCL_VAR int    V05 tmp4
[000010] ----G+-----             │  └──▌  LCL_FLD   int   (AX) V00 loc0         [+0]
[000037] DACXG------ arg1 setup  ├──▌  STORE_LCL_VAR int    V04 tmp3
[000012] --CXG+-----             │  └──▌  CALL      int    Runtime_85611:Bar(Runtime_85611+S):int
[000011] ----G+----- arg0 in rcx │     └──▌  LCL_ADDR  long   V00 loc0         [+0]
[000038] ----------- arg1 in rdx ├──▌  LCL_VAR   int    V04 tmp3
[000040] ----------- arg0 in rcx └──▌  LCL_VAR   int    V05 tmp4
```

The pass has some TP impact but it is mitigated since we only need to
visit statements with GTF_CALL.

Fix dotnet#85611
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jul 18, 2023
@ghost ghost assigned jakobbotsch Jul 18, 2023
@ghost
Copy link

ghost commented Jul 18, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

Last-use copy omission allows the JIT to avoid creating copies of struct values that are passed implicitly by-ref when the value is a last-use of a local. When we do this we effectively address expose the local for the lifetime of the call, so we mark the local as address exposed as part of doing so.

However, there is a problem where morph may reorder two uses of a such a local and break the last use information. For example, consider the following case:

[000015] --CXG------CALL      void   Program:Foo(int,int)
[000010] ----------- arg0 ├──▌  LCL_FLD   int    V00 loc0         [+0]
[000012] --CXG------ arg1 └──▌  CALL      int    Program:Bar(S):int
[000011] ----------- arg0    └──▌  LCL_VAR   struct<S, 32> V00 loc0          (last use)

V00 is used both at [000010] and at [000011], the latter as a last use. Since we only address expose V00 when we get to [000011] we do not mark [000010] with GTF_GLOB_REF; the net effect is the following call args morphing, where we have reordered the two occurrences illegally:

[000015] --CXG+-----CALL      void   Program:Foo(int,int)
[000037] DACXG------ arg1 setup  ├──▌  STORE_LCL_VAR int    V04 tmp3
[000012] --CXG+-----             │  └──▌  CALL      int    Program:Bar(S):int
[000011] -----+----- arg0 in rcx │     └──▌  LCL_ADDR  long   V00 loc0         [+0]
[000038] ----------- arg1 in rdx ├──▌  LCL_VAR   int    V04 tmp3
[000010] -----+----- arg0 in rcx └──▌  LCL_FLD   int   (AX) V00 loc0         [+0]

This change fixes the problem by running a separate pass over the IR before morph to identify the candidates for last-use copy omission. These candidates then have GTF_GLOB_REF added as part of morph. The net result is then the following correct IR:

[000015] --CXG+-----CALL      void   Runtime_85611:Foo(int,int)
[000039] DA--G------ arg0 setup  ├──▌  STORE_LCL_VAR int    V05 tmp4
[000010] ----G+-----             │  └──▌  LCL_FLD   int   (AX) V00 loc0         [+0]
[000037] DACXG------ arg1 setup  ├──▌  STORE_LCL_VAR int    V04 tmp3
[000012] --CXG+-----             │  └──▌  CALL      int    Runtime_85611:Bar(Runtime_85611+S):int
[000011] ----G+----- arg0 in rcx │     └──▌  LCL_ADDR  long   V00 loc0         [+0]
[000038] ----------- arg1 in rdx ├──▌  LCL_VAR   int    V04 tmp3
[000040] ----------- arg0 in rcx └──▌  LCL_VAR   int    V05 tmp4

I first started out with a fix that simply address exposed the candidates in the pass, but this has some semi-large regressions (diffs) since it disables lots of optimizations that morph will do.
It's still not ideal since address exposing the local late just has the effect of making these optimizations depend on the block visit order; for .NET 9 I want to consider a fix where we instead do this optimization in the backend.

Fix #85611

Tiny number of diffs are expected with positive TP diffs in minopts are expected due to the removal of fgRemarkGlobalUses. The TP impact of the pass is minor due to the use of GTF_CALL and the locals linked list to exit very early (about 0.03%, less than what is gained by the removal of fgRemarkGlobalUses).

Author: jakobbotsch
Assignees: -
Labels:

area-CodeGen-coreclr

Milestone: -

@jakobbotsch jakobbotsch reopened this Jul 18, 2023
@jakobbotsch
Copy link
Member Author

/azp run runtime

@azure-pipelines
Copy link

No commit pushedDate could be found for PR 89088 in repo dotnet/runtime

@jakobbotsch
Copy link
Member Author

/azp run runtime

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@jakobbotsch jakobbotsch marked this pull request as ready for review July 19, 2023 15:52
@jakobbotsch
Copy link
Member Author

cc @dotnet/jit-contrib PTAL @AndyAyersMS

The failure is #88507.

Small number of diffs, with a small TP impact (sometimes positive, sometimes negative).

@jakobbotsch jakobbotsch merged commit 3b46cf5 into dotnet:main Jul 19, 2023
132 of 135 checks passed
@jakobbotsch jakobbotsch deleted the fix-85611 branch July 19, 2023 16:24
@ghost ghost locked as resolved and limited conversation to collaborators Aug 18, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Fuzzlyn] Failure for another complex expression
3 participants